Between Bags and Trees - Constructional Patterns in Text Used for Attitude Identification

Authors

  • Jussi Karlgren
  • Gunnar Eriksson
  • Magnus Sahlgren
  • Oscar Täckström
Abstract

This paper describes experiments to use non-terminological information to find attitudinal expressions in written English text. The experiments are based on an analysis of text with respect to not only the vocabulary of content terms present in it (which most other approaches use as a basis for analysis) but also with respect to presence of structural features of the text represented by constructional features (typically disregarded by most other analyses). In our analysis, following a construction grammar framework, structural features are treated as occurrences, similarly to the treatment of vocabulary features. The constructional features in play are chosen to potentially signify opinion but are not specific to negative or positive expressions. The framework is used to classify clauses, headlines, and sentences from three different shared collections of attitudinal data. We find that constructional features transfer well across different text collections and that the information couched in them integrates easily with a vocabulary based approach, yielding improvements in classification without complicating the application end of the processing framework.

1 Attitude Analysis is Mostly Based on Lexical Statistics

Attitude analysis, opinion mining, or sentiment analysis, a subtask of information refinement from texts, has gained interest in recent years, both for its application potential and for the promise of shedding new light on hitherto unformalised aspects of human language usage: the expression of attitude, opinion, or sentiment is a quintessentially human activity. It is not explicitly conventionalised to the degree that many other aspects of language usage are. Most attempts to identify attitudinal expression in text have been based on lexical factors. Resources such as SentiWordNet, the Opinion Finder subjectivity lexicon, or the General Inquirer lexicon are utilised or developed by most research groups engaged in attitude analysis tasks [4, 18, 14].
But attitude is not a solely lexical matter. Expressions with identical or near-identical terms can be more or less attitudinal by virtue of their form (“He blew me off” vs. “He blew off”); combinations of fairly attitudinally loaded terms may lack attitudinal power (“He has the best result, we cannot fail him” vs. “This is the best coffee, we cannot fail with it”); certain terms considered neutral in typical language use can have strong attitudinal loading in certain discourses or at certain times (“Fifth Avenue”, “9/11”). Our approach takes as its starting point the observation that lexical resources are always noisy, out of date, and most often suffer from being both too specific and too general at once. Not only are lexical resources inherently somewhat unreliable or costly to maintain, but they do not cover all the possibilities of expression afforded by human linguistic behaviour. We believe that attitudinal expression in text is an excellent test case for general purpose approaches to processing linguistic data. We have previously tested resource-thrifty approaches for annotation of textual materials, arguing that general purpose linguistic analysis together with appropriate background materials for training a general language model provide a more general, more portable, and more robust methodology for extracting information from text [10]. This paper reports a series of experiments to investigate the general effectiveness of structural features as carriers of information in text, applied to the task of attitude analysis.

2 Constructions as Characteristic Features of Utterances

Our hypothesis is that investigating utterances for the presence of content-bearing words may be useful for identifying attitudinal expressions, but that structural features carry over more easily from one topical area to another and from one discourse to another.
It has previously been suggested that attitude in text is carried by dependencies among words, rather than by keywords, cue phrases, or high-frequency words [1]. We agree, but in contrast with previous work, we explicitly incorporate constructions in our knowledge representation, not as relations between terms but as features in their own right, following a construction grammar framework [9, 3]. Our claim is that the pattern of an utterance is a feature with the same ontological status as the terms that occur in the utterance: constructional features and lexical features both have conceptual meaning. Patterns are part of the signal, not incidental to it. This claim, operationalised for experimental purposes, gives us a convenient processing model. Where the step from bag-of-words analyses to complete parse trees is both computationally daunting and brittle in the face of fluid and changing data, we can within a constructional framework find middle ground: we use observations of pattern occurrences as features similarly to how we use observations of word occurrences. An utterance will then be characterised not only as a container for a number of words, but also as a container for some observed patterns. Some previous approaches to using syntactic analysis in large-scale text analysis have used segments of parse trees rather than the entire tree; however, the distinction between lexical features indicating content

  • Tense shift: It is this, I think, that commentators mean when they say glibly that the “world changed” after Sept 11.
  • Time adverbial: In Bishkek, they agreed to an informal meeting later this year, most likely to be held in Russia.
  • Object clause: China could use the test as a political signal to show the US that it is a rising nuclear power at this tense moment.
  • Verb chain: “Money could be earned by selling recycled first-run fuel and separated products which retain over 50 per cent of unused uranium,” Interfax news agency reported him as saying.
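The idea of treating pattern occurrences on a par with word occurrences can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expressions, the `CX_` feature labels, and the function name `extract_features` are all hypothetical, and the surface patterns are crude stand-ins for features that would more plausibly come from a syntactic analysis. The sketch only shows how an utterance can be represented as one bag holding both kinds of observations.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude surface detectors (assumptions made for
# illustration only); they merely approximate three of the construction
# types listed above.
CONSTRUCTION_PATTERNS = {
    "CX_TIME_ADVERBIAL": re.compile(
        r"\b(later this year|yesterday|tomorrow|last week)\b", re.I),
    "CX_OBJECT_CLAUSE": re.compile(
        r"\b(say|says|said|show|shows|think|thinks|mean|means) (\w+ ){0,2}that\b",
        re.I),
    "CX_VERB_CHAIN": re.compile(
        r"\b(could|would|should|might|may|must) (be|have) \w+(ed|en)\b", re.I),
}


def extract_features(utterance: str) -> Counter:
    """Return one feature bag with word counts and constructional markers."""
    # Lexical features: lowercased word tokens, as in a bag-of-words model.
    features = Counter(re.findall(r"[a-z0-9']+", utterance.lower()))
    # Constructional features: one upper-case marker per observed pattern,
    # stored in the same bag so a classifier sees both feature kinds alike.
    for label, pattern in CONSTRUCTION_PATTERNS.items():
        if pattern.search(utterance):
            features[label] += 1
    return features


if __name__ == "__main__":
    bag = extract_features("Money could be earned by selling recycled fuel.")
    print(sorted(bag.items()))
```

Because the markers live in the same bag as the word counts, any classifier that accepts bag-of-words input can use them unchanged, which is in the spirit of the claim that constructional information can be added without complicating the application end of the processing.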





Publication date: 2010